#
# numeric
# 29
# [1] "Are all rows complete?: TRUE"
# [1] "Are there any NAs?: FALSE"
# [1] "Are any values negative?: FALSE"
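Checks like the three above could be produced with base R along the following lines. This is a minimal sketch, not the report's actual code: the data frame `df` and its values are hypothetical stand-ins for the real data.

```r
# Hypothetical stand-in for the report's data (not the original dataset).
df <- data.frame(a = c(1.0, 2.5, 3.2), b = c(0.0, 4.1, 7.3))

# TRUE when no row contains a missing value.
all_complete <- all(complete.cases(df))

# TRUE when at least one cell is NA.
any_missing <- any(is.na(df))

# TRUE when at least one value is below zero (missing values are ignored).
any_negative <- any(df < 0, na.rm = TRUE)

print(paste("Are all rows complete?:", all_complete))
print(paste("Are there any NAs?:", any_missing))
print(paste("Are any values negative?:", any_negative))
```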
The heatmap below represents the data, with each value colored according to its magnitude. Hover over a cell to see its column name.
The column means and medians are presented in a combined heatmap and line plot below.
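As a rough illustration of the quantities being plotted, the per-column means and medians can be computed in base R as sketched here. The data frame `df` is hypothetical, and the report's own plotting code is not shown in the source.

```r
# Hypothetical numeric data; each column is one feature.
df <- data.frame(a = c(1, 2, 6), b = c(10, 20, 90))

# One mean and one median per column.
col_means <- colMeans(df)
col_medians <- sapply(df, median)

# Stack the two summaries into a 2 x ncol matrix, one row per statistic.
summary_stats <- rbind(mean = col_means, median = col_medians)
print(summary_stats)
```

A skewed column such as `b` shows why both statistics are worth plotting: its mean is pulled well above its median by the single large value.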
The correlation between two random variables measures one specific type of dependence: the degree to which a linear relationship exists between them. A value of 1 corresponds to a perfect direct linear relationship, 0 corresponds to no linear relationship, and -1 corresponds to a perfect inverse linear relationship.
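The three reference values can be reproduced with base R's `cor()`, which computes the Pearson correlation by default. The vectors below are made up for illustration. The last case shows that a correlation of 0 rules out only a linear relationship, not dependence in general.

```r
x <- -5:5

# Direct linear relationship: correlation close to 1.
r_direct <- cor(x, 2 * x + 1)

# Inverse linear relationship: correlation close to -1.
r_inverse <- cor(x, -3 * x + 5)

# y depends entirely on x, but symmetrically, so the linear
# association cancels out and the correlation is close to 0.
r_nonlinear <- cor(x, x^2)

print(c(direct = r_direct, inverse = r_inverse, nonlinear = r_nonlinear))
```

For a whole data frame of numeric columns, `cor(df)` returns the full correlation matrix.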
For each feature column, the data are binned and a heatmap is produced with each bin colored according to count.
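One way to obtain such per-column bin counts is sketched below in base R. The number of bins (`n_bins`), the equal-width binning between each column's minimum and maximum, and the data frame `df` are all assumptions made for illustration, since the source does not state how the bins are chosen.

```r
# Hypothetical numeric data; each column is one feature.
df <- data.frame(a = c(0.1, 0.2, 0.9, 0.5), b = c(1, 1, 2, 10))

n_bins <- 5  # assumed bin count, not taken from the report

# For each column, split its range into n_bins equal-width bins and count
# the observations per bin. Assumes no column is constant.
bin_counts <- sapply(df, function(col) {
  breaks <- seq(min(col), max(col), length.out = n_bins + 1)
  hist(col, breaks = breaks, plot = FALSE)$counts
})

# Rows are bins (lowest first), columns are features.
print(bin_counts)
```

The resulting matrix could then be passed to a heatmap function such as `image()` so that each cell is colored by its count.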
A pairs plot is a popular way of plotting high-dimensional data.
Every pair of dimensions is plotted, showing the projection of the data onto those two dimensions.
For readability, a maximum of 8 dimensions is plotted.
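A capped pairs plot can be sketched with base R's `pairs()`. The helper names `select_pair_columns` and `plot_pairs_capped` are hypothetical, and keeping the first 8 numeric columns is an assumption: the source does not say which dimensions are retained when there are more than 8.

```r
# Return the names of at most max_dims numeric columns (first ones win).
select_pair_columns <- function(df, max_dims = 8) {
  numeric_cols <- names(df)[sapply(df, is.numeric)]
  head(numeric_cols, max_dims)
}

# Draw a scatterplot matrix of the selected columns only.
plot_pairs_capped <- function(df, max_dims = 8) {
  shown <- select_pair_columns(df, max_dims)
  pairs(df[, shown, drop = FALSE])
  invisible(shown)
}

# mtcars ships with R and has 11 numeric columns, so 3 are dropped.
shown_cols <- select_pair_columns(mtcars)
print(shown_cols)
```

Calling `plot_pairs_capped(mtcars)` would then draw an 8 by 8 grid of scatterplots, one panel per pair of retained columns.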
An outlier is a data point that lies relatively far away from the bulk of the other observations. Outliers can have unwanted effects on data analysis and should therefore be considered carefully.
We use the built-in method from the randomForest package in R.
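The source does not name the method it uses. Assuming it refers to the package's `outlier()` function, which derives an outlying measure from a random forest's proximity matrix, a minimal sketch might look like the following. The data are synthetic, and the code only runs the model if the randomForest package is installed.

```r
# Synthetic data: 50 ordinary points plus one planted extreme point.
set.seed(1)
x <- data.frame(a = c(rnorm(50), 8), b = c(rnorm(50), 8))

outlier_scores <- NULL
if (requireNamespace("randomForest", quietly = TRUE)) {
  # With no response supplied, randomForest runs in unsupervised mode.
  # proximity = TRUE keeps the proximity matrix that outlier() relies on.
  rf <- randomForest::randomForest(x, proximity = TRUE)

  # One score per row; larger values mean the row is less similar
  # to the rest of the data according to the forest.
  outlier_scores <- randomForest::outlier(rf)
  print(head(sort(outlier_scores, decreasing = TRUE)))
}
```

Which scores count as outliers is a judgment call. This sketch only ranks the rows and does not apply a cutoff.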
The Bayesian Information Criterion is used to select the model parameters for Mclust.
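As an illustration of BIC-based selection, `Mclust()` from the mclust package fits Gaussian mixture models over a range of component counts and covariance structures and keeps the combination with the best BIC. The sketch below uses R's built-in `faithful` dataset as a stand-in for the report's data and only runs if mclust is installed.

```r
fit <- NULL
if (requireNamespace("mclust", quietly = TRUE)) {
  # Mclust() expects the package to be attached, not just loaded.
  suppressPackageStartupMessages(library(mclust))

  # Fit the candidate models and keep the one selected by BIC.
  fit <- Mclust(faithful)

  print(fit$modelName)  # selected covariance structure
  print(fit$G)          # selected number of mixture components
  print(fit$bic)        # BIC of the selected model
}
```

Note that mclust reports BIC on a scale where larger values are better, so the selected model is the one that maximizes it.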